Method and apparatus for predicting based on multi-source heterogeneous data

ABSTRACT

A method and apparatus for predicting based on multi-source heterogeneous data. The method comprises: acquiring, with regard to an event of a set type, at least two types of historical data that can reflect an event result; establishing a joint likelihood model of attribute data of the event of the set type and the historical data; determining an optimal estimation of the attribute data according to a maximum posterior principle; and determining, based on a probability distribution associated with the attribute data in the joint likelihood model, a parameter in the probability distribution as a prediction result of a predicted event of the set type. Some embodiments use a hierarchical model to introduce data of different sources into different data layers, unify heterogeneous data in a joint likelihood model to perform analysis, and obtain a more accurate, instant and stable prediction result through effective fusion.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application claims priority to Chinese Patent Application No. CN201410427849.7, filed on Aug. 27, 2014, the entire disclosure of which is incorporated herein by reference in its entirety and for all purposes.

TECHNICAL FIELD

The embodiments of the present invention relate to data processing technology, and in particular, to a method and apparatus for predicting based on multi-source heterogeneous data.

BACKGROUND

In the prior art, a common method of predicting is to predict an event result based on historical data and a model. A typical application scenario is to predict the results of various matches.

A prediction model based on historical match data mainly estimates the offensive/defensive capability of a match team by means of analyzing performances of various teams in historical match data, and predicts a match result of a subsequent match on this basis.

The defects of this technical solution are mainly as follows: because the matches of the various match teams are sparsely distributed over time, the players of a match team change, the players' own states fluctuate, and matches involve an element of chance, it is very difficult for a prediction model obtained on this basis to make a good estimate of the instantaneous relative strength of all match teams, with the result that the prediction has poor accuracy and is not stable enough. In addition, conditions occurring in a match schedule cannot be reflected promptly. Moreover, there is only one data source and the amount of information is relatively small, and thus a match result of a future match cannot be effectively predicted.

SUMMARY

The embodiments of the present invention provide a method and apparatus for predicting based on multi-source heterogeneous data, so as to improve the accuracy of prediction.

The embodiments of the present invention provide a method for predicting based on multi-source heterogeneous data, comprising:

acquiring, with regard to an event of a set type, at least two types of historical data which can reflect an event result;

establishing a joint likelihood model of attribute data of the event of the set type and the at least two types of historical data, and determining an optimal estimation of the attribute data according to a maximum posterior principle; and

determining, with regard to an event to be predicted which belongs to the event of the set type, based on a probability distribution associated with the attribute data in the joint likelihood model, a parameter in the probability distribution as a prediction result of the event to be predicted.

The embodiments of the present invention also provide an apparatus for predicting based on multi-source heterogeneous data, comprising:

a data acquisition module for acquiring, with regard to an event of a set type, at least two types of historical data which can reflect an event result;

a model estimation module for establishing a joint likelihood model of attribute data of the event of the set type and the at least two types of historical data, and determining an optimal estimation of the attribute data according to a maximum posterior principle; and

a result prediction module for determining, with regard to an event to be predicted which is of the set type, based on a probability distribution associated with the attribute data in the joint likelihood model, a parameter in the probability distribution as a prediction result of the event to be predicted.

The embodiments of the present invention use a hierarchical model to introduce data of different sources into different data layers, unify heterogeneous data in a joint likelihood model to perform analysis, and obtain a more accurate, instant and stable prediction result through effective fusion.

BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS

FIG. 1 is a flowchart of a method for predicting based on multi-source heterogeneous data provided by embodiment I of the present invention;

FIG. 2 is a flowchart of a method for predicting based on multi-source heterogeneous data provided by embodiment II of the present invention;

FIG. 3 is a schematic diagram illustrating a relationship between a model and a parameter applicable to embodiment II of the present invention; and

FIG. 4 is a structural schematic diagram of an apparatus for predicting based on multi-source heterogeneous data provided by embodiment III of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention will be further described in detail below in conjunction with the accompanying drawings and the embodiments. It can be understood that specific embodiments described herein are merely used for explaining the present invention, rather than limiting the present invention. Additionally, it also needs to be noted that, for ease of description, the accompanying drawings only show parts related to the present invention rather than all the structures.

Embodiment I

FIG. 1 is a flowchart of a method for predicting based on multi-source heterogeneous data proposed by embodiment I of the present invention. The embodiment of the present invention is applicable to the prediction of the result of an event, specifically comprising the following:

S110, with regard to an event of a set type, at least two types of historical data which can reflect an event result are acquired.

The event of the set type refers to an event which can be predicted by the method of the embodiment of the present invention and is generally an event of which an event result satisfies a certain probability distribution, typically such as a football match or a basketball match. The historical data refers to historical result data of such events. The event result may usually be expressed from multiple perspectives, and result data of each perspective may be viewed as being of one type. For example, with regard to a football match, score data may be viewed as a type of data which can reflect a match result, and viewed from the different perspectives of a home team and a visiting team, the score may be denoted as a home team score and a visiting team score; and odds data may be viewed as another type of data which can reflect the match result, wherein the odds data is generally acquired from organizations such as sports lotteries and often embodies the expectations of bettors and organizers for the match result, and the odds data may comprise home odds, average odds and visiting odds. Multiple types of historical data actually constitute a multi-source heterogeneous data form, unlike the prior art which is limited to a certain single data source type.

S120, a joint likelihood model of attribute data of the event of the set type and the at least two types of historical data is established, and an optimal estimation of the attribute data is determined according to a maximum posterior principle.

The attribute data refers to an intrinsic attribute of the event. Although it may contain some accidental factors, the attribute data is generally static and stable. For example, a match team offensive/defensive capability parameter of the match may be viewed as the attribute data, and although the performance level of a match team may be affected by weather, illness and injuries, it should be stable in general, and is also an important basis for predicting a match result. The present operation establishes a joint likelihood model of the attribute data and at least two types of historical data of the event, that is, multi-source heterogeneous historical data is comprehensively taken into consideration to determine the attribute data of the event from multiple levels.

The operation may specifically be: establishing the joint likelihood model according to a relationship between the at least two types of historical data and the attribute data and a correction function for correcting the relationship to satisfy a normal distribution.

The joint likelihood model comprehensively takes the relationships between various types of historical data and the attribute data into consideration, and the relationship between each type of historical data and the attribute data may be expressed by means of a certain distribution probability function. The relationship between the historical data and the attribute data preferably comprises a Poisson distribution function and/or a gamma distribution function. For example, the relationship between the odds and the offensive/defensive capability parameter may be expressed based on the Poisson distribution function, and the relationship between the score and the offensive/defensive capability parameter may be expressed based on the gamma distribution function. Of course, the distribution probability functions are not limited to these and may also be expressed using other distribution probability functions which satisfy the event relationships.

On this basis, it is preferred that the joint likelihood model further comprises a correction function for correcting the relationship to satisfy a normal distribution. Relationships within the attribute data, and between the attribute data and the match result, generally satisfy a normal relationship, and extreme cases are unlikely to occur; the correction function therefore avoids an excessively large deviation of the determined attribute data when abnormal relationships result from certain accidental factors or insufficient historical data.

S130, with regard to an event to be predicted which is of the set type, based on a probability distribution associated with the attribute data in the joint likelihood model, a parameter in the probability distribution is determined as a prediction result of the event to be predicted.

After the attribute data of the event of the set type is determined, with regard to events of the same type, this attribute data may be used, and based on a probability distribution associated with the attribute data in the joint likelihood model, a parameter in the probability distribution is determined. This is actually a reverse process of determining attribute data based on historical data.

The embodiments of the present invention use a hierarchical model to introduce data of different sources into different data layers, unify heterogeneous data in a joint likelihood model to perform analysis, and obtain a more accurate, instant and stable prediction result through effective fusion.

Embodiment II

FIG. 2 is a flowchart of a method for predicting based on multi-source heterogeneous data provided by embodiment II of the present invention; and FIG. 3 is a schematic diagram illustrating a relationship between a model and a parameter applicable to embodiment II of the present invention. This embodiment provides a specific solution for execution, and the description is provided with match prediction as an example. The method specifically comprises:

S210, with regard to a match, a historical score and historical odds which can reflect a match result are acquired as two types of historical data.

The present operation takes historical results of multiple matches, wherein each of the multiple matches may be denoted as match m, m being the sequence number of the match and the value range of m being 1 to M. The score of each match m is denoted as a home team score s_(m,1) and a visiting team score s_(m,2); and the odds of each match m are denoted as home odds P_(m,1), average odds P_(m,2) and visiting odds P_(m,3).

S220, normalization processing is performed on the historical odds.

The present operation is an optional step that adapts the odds to a parameter form of a distribution probability function. It is preferred that the normalization processing be performed on the odds based on formulas as follows:

$P'_{m,1} = P_{m,1} / (P_{m,1} + P_{m,2} + P_{m,3})$  (1)

$P'_{m,2} = P_{m,2} / (P_{m,1} + P_{m,2} + P_{m,3})$  (2)

$P'_{m,3} = P_{m,3} / (P_{m,1} + P_{m,2} + P_{m,3})$  (3)

P′_(m,1), P′_(m,2), and P′_(m,3) are the odds after normalization processing, and for uniform description hereinafter, the odds after normalization processing are still denoted as P_(m,1), P_(m,2), and P_(m,3).
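By way of non-limiting illustration, the normalization of formulas (1) to (3) may be sketched in Python as follows. The function name normalize_odds and the numeric values are hypothetical and serve only to show the arithmetic.

```python
def normalize_odds(home, average, visiting):
    """Apply formulas (1)-(3): divide each value by the sum of the three."""
    total = home + average + visiting
    if total <= 0:
        raise ValueError("the three odds values must sum to a positive number")
    return home / total, average / total, visiting / total


# Hypothetical raw values for one match m (illustrative numbers only).
p1, p2, p3 = normalize_odds(0.50, 0.30, 0.25)
print(p1, p2, p3)
```

After this step the three values sum to 1, which is what allows them to be compared with the three outcome probabilities used in the subsequent operation.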

S230, with regard to a match m, a home team goal parameter λ_(m,1) and a visiting team goal parameter λ_(m,2) of the Poisson distribution of the match m are determined using home odds P_(m,1), average odds P_(m,2) and visiting odds P_(m,3) of the match m based on a relational expression as follows:

$\begin{cases} P_{m,1} = P(\mathrm{Poisson}(\lambda_{m,1}) > \mathrm{Poisson}(\lambda_{m,2})) \\ P_{m,2} = P(\mathrm{Poisson}(\lambda_{m,1}) = \mathrm{Poisson}(\lambda_{m,2})) \\ P_{m,3} = P(\mathrm{Poisson}(\lambda_{m,1}) < \mathrm{Poisson}(\lambda_{m,2})) \end{cases} \quad (4)$

where P( ) is a distribution probability, namely, the probability that the relational expression in the brackets is satisfied.

Poisson(λ) denotes a Poisson distribution with λ as a parameter, that is, a random variable X that takes only the non-negative integer values 0, 1, 2, . . . and whose probability distribution obeys Poisson(λ). The meaning of P_(m,1)=P(Poisson(λ_(m,1))>Poisson(λ_(m,2))) is then that the values of the home team goal parameter λ_(m,1) and the visiting team goal parameter λ_(m,2) are such that the probability of Poisson(λ_(m,1))>Poisson(λ_(m,2)) is equal to the home odds P_(m,1). The meanings of the other two formulas are similar, and the home team goal parameter λ_(m,1) and the visiting team goal parameter λ_(m,2) should satisfy all three of the above-mentioned relational expressions.

The home team goal parameters λ_(m,1) and the visiting team goal parameters λ_(m,2) of various matches are all determined according to the above relational expressions. The matches per se are different, for example, the weather, date, importance of the match, score and odds are different, and therefore even if the match teams participating in the matches are the same, the determined home team goal parameters λ_(m,1) and visiting team goal parameters λ_(m,2) are not exactly the same. The home team goal parameters λ_(m,1) and visiting team goal parameters λ_(m,2) determined for a match team across the matches in which it participates are independent of one another.
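By way of non-limiting illustration, the right-hand sides of relational expression (4) can be evaluated numerically, and goal parameters that approximately satisfy the three expressions can be searched for. The embodiment does not state how the expressions are solved; the sketch below assumes a least-squares fit by coarse-to-fine grid search, and the function names, the truncation at 30 goals and the search bounds are all hypothetical choices.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), computed in log space."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def outcome_probs(lam1, lam2, max_goals=30):
    """Return (P(X > Y), P(X = Y), P(X < Y)) for independent
    X ~ Poisson(lam1) and Y ~ Poisson(lam2), truncated at max_goals."""
    px = [poisson_pmf(k, lam1) for k in range(max_goals + 1)]
    py = [poisson_pmf(k, lam2) for k in range(max_goals + 1)]
    win = draw = loss = 0.0
    below_x = below_y = 0.0  # running P(X < k) and P(Y < k)
    for k in range(max_goals + 1):
        win += px[k] * below_y
        loss += py[k] * below_x
        draw += px[k] * py[k]
        below_x += px[k]
        below_y += py[k]
    return win, draw, loss


def fit_goal_params(p1, p2, p3, lo=0.05, hi=6.0, steps=30, rounds=5):
    """Assumed solver: coarse-to-fine grid search minimising the squared
    error between (p1, p2, p3) and the probabilities of expression (4)."""
    lo1, hi1, lo2, hi2 = lo, hi, lo, hi
    best = (float("inf"), lo, lo)
    for _ in range(rounds):
        d1 = (hi1 - lo1) / steps
        d2 = (hi2 - lo2) / steps
        for a in range(steps + 1):
            lam1 = lo1 + a * d1
            for b in range(steps + 1):
                lam2 = lo2 + b * d2
                w, d, l = outcome_probs(lam1, lam2)
                err = (w - p1) ** 2 + (d - p2) ** 2 + (l - p3) ** 2
                if err < best[0]:
                    best = (err, lam1, lam2)
        # Shrink the search window around the current best point.
        _, c1, c2 = best
        lo1, hi1 = max(lo, c1 - 3 * d1), min(hi, c1 + 3 * d1)
        lo2, hi2 = max(lo, c2 - 3 * d2), min(hi, c2 + 3 * d2)
    return best[1], best[2]


# Round trip with hypothetical goal parameters 1.6 and 1.1.
target = outcome_probs(1.6, 1.1)
lam_home, lam_away = fit_goal_params(*target)
print(lam_home, lam_away)
```

Because the three normalized odds sum to 1, expression (4) effectively supplies two independent conditions for the two unknowns, which is why a least-squares fit is a reasonable stand-in when real odds do not match any pair of Poisson parameters exactly.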

S240, the home team goal parameter λ_(m,1) and the visiting team goal parameter λ_(m,2) of each match, which are determined based on odds, as well as the home team score s_(m,1) and the visiting team score s_(m,2) of each match, are substituted into the following formula to construct a joint likelihood model regarding a match team offensive/defensive capability parameter θ, and the offensive/defensive capability parameter θ of each match team is determined in a maximum posterior manner:

$\log P(\theta) = g \log P(s|\theta) + (1-g) \log P(\lambda|\theta) + \log N(\theta; 0, \sigma_1^2) + \log N(\alpha_i - d_i; 0, \sigma_2^2)$  (5)

where

$\theta = (\{\alpha_i\}_{i=1 \ldots n}, \{d_j\}_{j=1 \ldots n}, \{b_k, b_{k'}\}_{k,k'=1 \ldots K})$

θ is the offensive/defensive capability parameter set of the match teams. n is the number of match teams, i and j being sequence numbers of match teams; α_(i) is an offensive capability parameter of a match team i, d_(j) is a defensive capability parameter of a match team j, and b_(k) and b_(k′) are state adjustment parameters of the match which are respectively used for correcting, according to the state of a match, an offensive capability parameter and a defensive capability parameter of a match team when serving as a home team and a visiting team. The so-called state adjustment parameter is a correction coefficient for adjusting the offensive/defensive capability parameter of a match team according to the state of the match, because even if the match teams participating in the match are the same, the offensive/defensive capability may also change due to the natural state of the match, for example, the weather conditions during the match, or whether the match is a friendly match, a World Cup match, a league match, etc. Thus b_(k) may be set as a correction coefficient for the offensive/defensive capability of the home team, and b_(k′) may be set as a correction coefficient for the offensive/defensive capability of the visiting team, both of which can be obtained through estimation from the model.

P(θ) has the meaning of a distribution probability of θ, i.e. it denotes the probability density of θ as a continuously distributed random variable.

The first item in the relational expression (5) is as follows:

$\begin{aligned} P(s|\theta) &= \int_0^\infty P(s|\lambda)\, P(\lambda|\theta)\, d\lambda \\ &= \int_0^\infty \frac{\lambda^{s}}{s!} e^{-\lambda}\, \frac{\beta^{\exp(x)}}{\Gamma(\exp(x))}\, \lambda^{\exp(x)-1} \exp(-\beta\lambda)\, d\lambda \\ &= \frac{\Gamma(\exp(x)+s)}{\Gamma(\exp(x))}\, \frac{\beta^{\exp(x)}}{(\beta+1)^{\exp(x)+s}}\, \frac{1}{s!} \int_0^\infty \frac{(\beta+1)^{\exp(x)+s}}{\Gamma(\exp(x)+s)}\, \lambda^{\exp(x)+s-1} \exp(-(\beta+1)\lambda)\, d\lambda \\ &= \frac{\Gamma(\exp(x)+s)}{\Gamma(\exp(x))} \cdot \frac{\beta^{\exp(x)}}{(\beta+1)^{\exp(x)+s}} \cdot \frac{1}{s!} \end{aligned}$

When the value of s is s_(m,1), $x = b_{k_m} + \alpha_{i_m} - d_{j_m}$;

when the value of s is s_(m,2), $x = b_{k'_m} + \alpha_{j_m} - d_{i_m}$;

P(s|θ) is used for denoting the relationship between the offensive/defensive capability parameter θ and the score s.
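By way of non-limiting illustration, the closed form of P(s|θ) obtained above may be evaluated in log space as sketched below. The function name log_p_score and the parameter values are hypothetical; the check at the end relies only on the facts that the expression is a probability over s = 0, 1, 2, . . . and that its mean is exp(x)/β.

```python
import math


def log_p_score(s, x, beta):
    """log P(s | theta) for a score s, with x = b + alpha - d:
    Gamma(r + s) / Gamma(r) * beta^r / (beta + 1)^(r + s) / s!,
    where r = exp(x)."""
    r = math.exp(x)
    return (math.lgamma(r + s) - math.lgamma(r)
            + r * math.log(beta)
            - (r + s) * math.log(beta + 1.0)
            - math.lgamma(s + 1))


# Hypothetical values: b = 0.1, alpha = 0.4, d = 0.2, beta = 1.0.
x = 0.1 + 0.4 - 0.2
beta = 1.0
probs = [math.exp(log_p_score(s, x, beta)) for s in range(200)]
total = sum(probs)
mean = sum(s * p for s, p in enumerate(probs))
print(total, mean, math.exp(x) / beta)
```

Working with math.lgamma rather than the gamma function itself avoids overflow when exp(x) + s is large.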

The second item in the relational expression (5) is as follows:

$P(\lambda|\theta) = \frac{\beta^{\exp(x)}}{\Gamma(\exp(x))} \cdot \lambda^{\exp(x)-1} \cdot \exp(-\beta\lambda)$

When the value of λ is λ_(m,1), $x = b_{k_m} + \alpha_{i_m} - d_{j_m}$;

when the value of λ is λ_(m,2), $x = b_{k'_m} + \alpha_{j_m} - d_{i_m}$;

P(λ|θ) is used for denoting the relationship between the offensive/defensive capability parameter θ and a goal parameter λ.
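By way of non-limiting illustration, the gamma density P(λ|θ) may likewise be evaluated in log space. The function name log_p_goal_param and the numeric values are hypothetical.

```python
import math


def log_p_goal_param(lam, x, beta):
    """log P(lambda | theta): log of a gamma density with shape
    exp(x) and rate beta, evaluated at lam > 0."""
    r = math.exp(x)
    return (r * math.log(beta) - math.lgamma(r)
            + (r - 1.0) * math.log(lam)
            - beta * lam)


# With x = 0 the shape is 1 and the density reduces to an exponential
# density, beta * exp(-beta * lam), which gives an easy sanity check.
value = log_p_goal_param(2.0, 0.0, 1.5)
expected = math.log(1.5) - 1.5 * 2.0
print(value, expected)
```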

g is a preset weight value, and the influence of the score and odds on the offensive/defensive capability parameter may be adjusted by adjusting the weight value; β is a preset first adjustment parameter value, σ₁ is a preset second adjustment parameter value, and σ₂ is a preset third adjustment parameter value; and the above-mentioned parameter values may all be preset according to experience or experiments and may also be adjusted according to prediction conditions.

Γ( ) is a gamma function; and log N( ) denotes the logarithm of a normal distribution function.

The third item in the relational expression (5) is as follows:

log N(θ; 0, σ₁²) is used for correcting a match team with historical data lower than a set threshold value. That is, when there is relatively little historical data of the match team, this adjustment item is set in order to avoid the case where the determined match team offensive/defensive capability has a large deviation due to there being little sample data, because the offensive/defensive capability of the match team generally satisfies the normal distribution and will not show excessively large fluctuation across matches.

The fourth item in the relational expression (5) is as follows:

log N(α_(i)−d_(i); 0, σ₂²) is used for correcting the balance of offensive/defensive capability of a match team. That is, as regards each match team, the offensive and defensive capabilities thereof are generally associated and there will be no great difference between them, and therefore this adjustment item is used for correction.

Based on the adjustment items, that is, through adjusting the model hyper-parameters, the balance of the model between using historical data and odds data may also be controlled, and a ratio of importance degrees of distant data to instant data may be controlled.
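By way of non-limiting illustration, the four items of relational expression (5) may be combined into a single objective as sketched below. Expression (5) is written schematically, so the sketch assumes that the first two items are summed over all matches and both teams, that the third item is summed over every component of θ, and that the fourth item is summed over teams; it also assumes one state index k per match shared by b_(k) and b_(k′). All names and data layouts are hypothetical.

```python
import math


def log_normal_pdf(v, sigma):
    """log N(v; 0, sigma^2)."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - v * v / (2.0 * sigma ** 2)


def log_p_score(s, x, beta):
    """log P(s | theta), first item of expression (5)."""
    r = math.exp(x)
    return (math.lgamma(r + s) - math.lgamma(r) + r * math.log(beta)
            - (r + s) * math.log(beta + 1.0) - math.lgamma(s + 1))


def log_p_goal_param(lam, x, beta):
    """log P(lambda | theta), second item of expression (5)."""
    r = math.exp(x)
    return (r * math.log(beta) - math.lgamma(r)
            + (r - 1.0) * math.log(lam) - beta * lam)


def log_posterior(alpha, d, b_home, b_away, matches, g, beta, sigma1, sigma2):
    """Objective of expression (5). Each match is a tuple
    (i, j, k, s1, s2, lam1, lam2): home team i, visiting team j,
    state index k, scores, and odds-derived goal parameters."""
    total = 0.0
    for i, j, k, s1, s2, lam1, lam2 in matches:
        x_home = b_home[k] + alpha[i] - d[j]
        x_away = b_away[k] + alpha[j] - d[i]
        total += g * (log_p_score(s1, x_home, beta)
                      + log_p_score(s2, x_away, beta))
        total += (1.0 - g) * (log_p_goal_param(lam1, x_home, beta)
                              + log_p_goal_param(lam2, x_away, beta))
    # Third item: pulls every component of theta toward 0.
    for v in list(alpha) + list(d) + list(b_home) + list(b_away):
        total += log_normal_pdf(v, sigma1)
    # Fourth item: keeps each team's offence and defence close together.
    for a_i, d_i in zip(alpha, d):
        total += log_normal_pdf(a_i - d_i, sigma2)
    return total


# Two hypothetical teams, one match state, one hypothetical match.
alpha, d, b_home, b_away = [0.0, 0.0], [0.0, 0.0], [0.0], [0.0]
matches = [(0, 1, 0, 2, 1, 1.5, 1.0)]
prior_only = log_posterior(alpha, d, b_home, b_away, [], 0.5, 1.0, 1.0, 0.5)
with_match = log_posterior(alpha, d, b_home, b_away, matches, 0.5, 1.0, 1.0, 0.5)
print(prior_only, with_match)
```

The maximum posterior estimate of θ is the argument that maximizes this objective; any general-purpose numerical optimizer could be applied to it, and the choice of optimizer is not specified by the embodiment.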

S250, with regard to a match c to be predicted, a home team distribution parameter λ_(c,1) and a visiting team distribution parameter λ_(c,2) of the match c to be predicted are determined according to offensive/defensive capability parameters θ of a home team i and a visiting team j participating in the match based on relational expressions as follows:

$\lambda_{c,1} \sim \mathrm{Gamma}(\exp(b_{k_c} + \alpha_{i_c} - d_{j_c}), \beta)$  (6)

$\lambda_{c,2} \sim \mathrm{Gamma}(\exp(b_{k'_c} + \alpha_{j_c} - d_{i_c}), \beta)$  (7)

where $b_{k_c}$ is a home team correction coefficient determined according to the match c to be predicted, $\alpha_{i_c}$ is an offensive capability parameter of the home team i, $d_{j_c}$ is a defensive capability parameter of the visiting team j, $b_{k'_c}$ is a visiting team correction coefficient determined according to the match c to be predicted, $\alpha_{j_c}$ is an offensive capability parameter of the visiting team j, and $d_{i_c}$ is a defensive capability parameter of the home team i.
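By way of non-limiting illustration, a distribution parameter following relational expression (6) may be drawn as sketched below. The sketch assumes that β in expressions (6) and (7) is a rate, consistent with the factor exp(−βλ) in P(λ|θ); Python's random.gammavariate takes a shape and a scale, so the scale passed is 1/β. The function names and the parameter values are hypothetical.

```python
import math
import random


def gamma_shape(b, alpha_att, d_def):
    """Shape exp(b + alpha - d) of the gamma distribution in (6)/(7)."""
    return math.exp(b + alpha_att - d_def)


def sample_goal_param(shape, beta, rng):
    """Draw lambda_c ~ Gamma(shape, rate=beta).
    random.gammavariate expects (shape, scale), hence scale = 1 / beta."""
    return rng.gammavariate(shape, 1.0 / beta)


rng = random.Random(0)
beta = 1.0
# Hypothetical estimates: home correction 0.1, home offence 0.4,
# visiting defence 0.2.
shape_home = gamma_shape(b=0.1, alpha_att=0.4, d_def=0.2)
draws = [sample_goal_param(shape_home, beta, rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
expected_mean = shape_home / beta  # mean of a gamma with this shape and rate
print(sample_mean, expected_mean)
```

The mean exp(b + α − d)/β shows how the estimated capability parameters translate into an expected goal rate: a stronger offence or a weaker opposing defence raises it, and β rescales it.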

S260, with regard to the match c to be predicted, odds P_(c,1), P_(c,2) and P_(c,3) are determined according to the home team distribution parameter λ_(c,1) and the visiting team distribution parameter λ_(c,2) based on relational expressions as follows:

$\begin{cases} P_{c,1} = P(\mathrm{Poisson}(\lambda_{c,1}) > \mathrm{Poisson}(\lambda_{c,2})) \\ P_{c,2} = P(\mathrm{Poisson}(\lambda_{c,1}) = \mathrm{Poisson}(\lambda_{c,2})) \\ P_{c,3} = P(\mathrm{Poisson}(\lambda_{c,1}) < \mathrm{Poisson}(\lambda_{c,2})) \end{cases} \quad (8)$

S270, with regard to the match c to be predicted, scores S_(c,1) and S_(c,2) are determined according to the home team distribution parameter λ_(c,1) and the visiting team distribution parameter λ_(c,2) based on relational expressions as follows:

$S_{c,1} \sim \mathrm{Poisson}(\lambda_{c,1})$  (9)

$S_{c,2} \sim \mathrm{Poisson}(\lambda_{c,2})$  (10)
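By way of non-limiting illustration, operations S250 to S270 may be chained in a Monte Carlo simulation: distribution parameters are drawn per expressions (6) and (7), scores are drawn per expressions (9) and (10), and the outcome frequencies approximate the probabilities of expression (8). The embodiment does not prescribe simulation; this is one assumed way of evaluating the expressions. The Poisson sampler uses Knuth's multiplication method, and all names and numeric values are hypothetical.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw from Poisson(lam) by Knuth's multiplication method
    (adequate for the small rates typical of goal counts)."""
    threshold = math.exp(-lam)
    k = 0
    p = rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_match(shape_home, shape_away, beta, n=20000, seed=0):
    """Estimate (home win, draw, visiting win) frequencies for a match c."""
    rng = random.Random(seed)
    win = draw = loss = 0
    for _ in range(n):
        # Expressions (6)/(7): beta is treated as a rate, so scale = 1 / beta.
        lam1 = rng.gammavariate(shape_home, 1.0 / beta)
        lam2 = rng.gammavariate(shape_away, 1.0 / beta)
        # Expressions (9)/(10): scores given the distribution parameters.
        s1 = sample_poisson(lam1, rng)
        s2 = sample_poisson(lam2, rng)
        if s1 > s2:
            win += 1
        elif s1 == s2:
            draw += 1
        else:
            loss += 1
    return win / n, draw / n, loss / n


# Hypothetical shapes exp(b + alpha - d) for a stronger home team.
p_home, p_draw, p_away = simulate_match(math.exp(0.6), math.exp(-0.2), beta=1.0)
print(p_home, p_draw, p_away)
```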

The technical solution of the embodiments of the present invention performs modeling analysis on the offensive/defensive capability parameter of a match team, and performs simulated calculation of possible match results of future matches on this basis. With regard to a match result of each match, two Poisson distributions are used to perform modeling, and a model relationship from scores to Poisson distribution parameters and a model relationship from odds to the Poisson distribution parameters are established in sequence; furthermore, the offensive/defensive capability parameter is used to perform modeling on the distribution of the Poisson distribution parameters; two types of data are integrated using a model having a (deep) hierarchical structure; and finally, probability estimations of possible results of future matches and other outputs are provided according to model results. The distribution of scores is depicted using two Poisson distributions, and meanwhile, parameter values of the Poisson model of the distribution of scores are calculated backward from odds data; a gamma distribution is used to depict the distribution of Poisson model parameter values; the gamma distribution parameters are associated with the offensive/defensive capability parameters of the home and visiting match teams and other features; and the Poisson distribution parameters obtained from the two aspects are synthesized to establish a joint likelihood of the offensive/defensive capability of the team with match results and odds data.

The technical solution of the embodiment of the present invention may effectively utilize data of different structures from different sources. With respect to the prior art, a majority of the existing football prediction models only consider using historical score data, without any multi-source data fusion content. In the embodiment of the present invention, by contrast, through integrating multi-source heterogeneous data, the solution can obtain better accuracy, better timeliness and superior stability.

Embodiment III

FIG. 4 is a structural schematic diagram of an apparatus for predicting based on multi-source heterogeneous data provided by embodiment III of the present invention. The apparatus comprises: a data acquisition module 410, a model estimation module 420 and a result prediction module 430. The data acquisition module 410 is used for acquiring, with regard to an event of a set type, at least two types of historical data which can reflect an event result; the model estimation module 420 is used for establishing a joint likelihood model of attribute data of the event of the set type and the at least two types of historical data, and determining an optimal estimation of the attribute data according to a maximum posterior principle; and the result prediction module 430 is used for determining, with regard to an event to be predicted which is of the set type, based on a probability distribution associated with the attribute data in the joint likelihood model, a parameter in the probability distribution as a prediction result of the event to be predicted.

In the above-mentioned technical solution, the model estimation module 420 is specifically used for: establishing the joint likelihood model according to a relationship between the at least two types of historical data and the attribute data and a correction function for correcting the relationship to satisfy a normal distribution.

The relationship between the historical data and the attribute data preferably comprises a Poisson distribution function and/or a gamma distribution function.

A preferred example provided based on the above-mentioned technical solution is:

the data acquisition module 410 is specifically used for acquiring, with regard to a match, a historical score and historical odds which can reflect a match result as two types of historical data;

the model estimation module 420 is specifically used for:

with regard to a match m, determining a home team goal parameter λ_(m,1) and a visiting team goal parameter λ_(m,2) of the Poisson distribution of the match m using home odds P_(m,1), average odds P_(m,2) and visiting odds P_(m,3) of the match m based on a relational expression as follows:

$\begin{cases} P_{m,1} = P(\mathrm{Poisson}(\lambda_{m,1}) > \mathrm{Poisson}(\lambda_{m,2})) \\ P_{m,2} = P(\mathrm{Poisson}(\lambda_{m,1}) = \mathrm{Poisson}(\lambda_{m,2})) \\ P_{m,3} = P(\mathrm{Poisson}(\lambda_{m,1}) < \mathrm{Poisson}(\lambda_{m,2})) \end{cases}$

where P( ) is a distribution probability; m is the sequence number of the match, and the value range of m is 1 to M;

substituting the home team goal parameter λ_(m,1) and the visiting team goal parameter λ_(m,2) of each match, which are determined based on odds, as well as the home team score s_(m,1) and the visiting team score s_(m,2) of each match, into the following formula to construct a joint likelihood model regarding a match team offensive/defensive capability parameter θ, and determining the offensive/defensive capability parameter θ of each match team in a maximum posterior manner:

$\log P(\theta) = g \log P(s|\theta) + (1-g) \log P(\lambda|\theta) + \log N(\theta; 0, \sigma_1^2) + \log N(\alpha_i - d_i; 0, \sigma_2^2)$

where

$\theta = (\{\alpha_i\}_{i=1 \ldots n}, \{d_j\}_{j=1 \ldots n}, \{b_k, b_{k'}\}_{k,k'=1 \ldots K})$

n is the number of match teams, i and j being sequence numbers of match teams; α_(i) is an offensive capability parameter of a match team i, d_(j) is a defensive capability parameter of a match team j, and b_(k) and b_(k′) are state adjustment parameters of the match which are respectively used for correcting, according to the state of a match, an offensive capability parameter and a defensive capability parameter of a match team when serving as a home team and a visiting team;

the meaning of P(θ) is a distribution probability of θ;

$P(s|\theta) = \frac{\Gamma(\exp(x)+s)}{\Gamma(\exp(x))} \cdot \frac{\beta^{\exp(x)}}{(\beta+1)^{\exp(x)+s}} \cdot \frac{1}{s!}$

when the value of s is s_(m,1), $x = b_{k_m} + \alpha_{i_m} - d_{j_m}$;

when the value of s is s_(m,2), $x = b_{k'_m} + \alpha_{j_m} - d_{i_m}$;

$P(\lambda|\theta) = \frac{\beta^{\exp(x)}}{\Gamma(\exp(x))} \cdot \lambda^{\exp(x)-1} \cdot \exp(-\beta\lambda)$

when the value of λ is λ_(m,1), $x = b_{k_m} + \alpha_{i_m} - d_{j_m}$;

when the value of λ is λ_(m,2), $x = b_{k'_m} + \alpha_{j_m} - d_{i_m}$;

g is a preset weight value, β is a preset first adjustment parameter value, σ₁ is a preset second adjustment parameter value, and σ₂ is a preset third adjustment parameter value;

Γ( ) is a gamma function;

log N( ) denotes the logarithm of a normal distribution function;

log N(θ; 0, σ₁²) is used for correcting a match team with historical data lower than a set threshold value; and

log N(α_(i)−d_(i); 0, σ₂²) is used for correcting the balance of offensive/defensive capability of a match team.

The result prediction module 430 is specifically used for:

with regard to a match c to be predicted, determining a home team distribution parameter λ_(c,1) and a visiting team distribution parameter λ_(c,2) of the match c to be predicted according to offensive/defensive capability parameters θ of a home team i and a visiting team j participating in the match based on relational expressions as follows:

$\lambda_{c,1} \sim \mathrm{Gamma}(\exp(b_{k_c} + \alpha_{i_c} - d_{j_c}), \beta)$

$\lambda_{c,2} \sim \mathrm{Gamma}(\exp(b_{k'_c} + \alpha_{j_c} - d_{i_c}), \beta)$

where $b_{k_c}$ is a home team correction coefficient determined according to the match c to be predicted, $\alpha_{i_c}$ is an offensive capability of the home team i, $d_{j_c}$ is a defensive capability of the visiting team j, $b_{k'_c}$ is a visiting team correction coefficient determined according to the match c to be predicted, $\alpha_{j_c}$ is an offensive capability of the visiting team j, and $d_{i_c}$ is a defensive capability of the home team i;

with regard to the match c to be predicted, determining odds P_(c,1), P_(c,2) and P_(c,3) according to the home team distribution parameter λ_(c,1) and the visiting team distribution parameter λ_(c,2) based on a relational expression as follows:

$\begin{cases} P_{c,1} = P(\mathrm{Poisson}(\lambda_{c,1}) > \mathrm{Poisson}(\lambda_{c,2})) \\ P_{c,2} = P(\mathrm{Poisson}(\lambda_{c,1}) = \mathrm{Poisson}(\lambda_{c,2})) \\ P_{c,3} = P(\mathrm{Poisson}(\lambda_{c,1}) < \mathrm{Poisson}(\lambda_{c,2})) \end{cases}$

with regard to the match c to be predicted, determining scores S_(c,1) and S_(c,2) according to the home team distribution parameter λ_(c,1) and the visiting team distribution parameter λ_(c,2) based on relational expressions as follows:

$S_{c,1} \sim \mathrm{Poisson}(\lambda_{c,1})$

$S_{c,2} \sim \mathrm{Poisson}(\lambda_{c,2})$.

The model estimation module 430 is also specifically used for performingnormalization processing on the historical odds after acquiring, withregard to a match, a historical score and historical odds which canreflect a match result as two types of historical data.

The apparatus for predicting based on multi-source heterogeneous data provided by the embodiments of the present invention is used for carrying out the method for predicting based on multi-source heterogeneous data provided by the embodiments of the present invention, can carry out corresponding operations and has corresponding functions and beneficial effects.

It should be noted that the above are merely preferred embodiments and applied technical principles of the present invention. Those of skill in the art will understand that the present invention is not limited to the particular embodiments described herein, and for those of skill in the art, various obvious modifications, readjustments and substitutions can be carried out without deviating from the scope of protection of the present invention. Therefore, although the present invention is described in detail through the above embodiments, the present invention is not merely limited to the above embodiments; other equivalent embodiments may also be included without deviating from the concept of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

What is claimed is:
 1. A computer-implemented method for event predicting via machine learning based on multi-source heterogeneous data, comprising: acquiring at least two types of historical data associated with an event result for a first event of a predetermined type; establishing a joint likelihood model of attribute data of the first event and the at least two types of historical data; determining an optimal estimation of the attribute data based on the joint likelihood model according to a maximum posterior principle; and determining a parameter in a probability distribution as a prediction result of a second event based on the probability distribution associated with the attribute data in the joint likelihood model, wherein the joint likelihood model includes one or more adjustment parameters for correcting the joint likelihood model, the adjustment parameters being determined iteratively based on an accuracy of the prediction result.
 2. The method of claim 1, wherein said determining the parameter comprises determining the parameter in the probability distribution as the prediction result of an event to be predicted based on the probability distribution.
 3. The method of claim 2, wherein a preselected type of the second event and the predetermined type of the first event are identical.
 4. The method of claim 1, wherein said establishing comprises establishing the joint likelihood model according to a relationship between the at least two types of historical data and the attribute data and a correction function for correcting the relationship to satisfy a normal distribution.
 5. The method of claim 4, wherein the relationship between the at least two types of historical data and the attribute data comprises a Poisson distribution function.
 6. The method of claim 4, wherein the relationship between the at least two types of historical data and the attribute data comprises a gamma distribution function.
 7. The method of claim 1, wherein the one or more adjustment parameters include one or more state adjustment parameters, a preset weight value, a first preset adjustment parameter value, a second preset adjustment parameter value, a third preset adjustment parameter value, or a combination thereof, the method further comprising: acquiring, with regard to a match, a historical score and historical odds that can reflect a match result as the two types of historical data; with regard to a match m, determining a home team goal parameter $\lambda_{m,1}$ and a visiting team goal parameter $\lambda_{m,2}$ of a Poisson distribution of the match m using home odds $p_{m,1}$, average odds $p_{m,2}$ and visiting odds $p_{m,3}$ of the match m based on a relational expression as follows:
$\begin{cases} p_{m,1} = P\left(\mathrm{Poisson}(\lambda_{m,1}) > \mathrm{Poisson}(\lambda_{m,2})\right) \\ p_{m,2} = P\left(\mathrm{Poisson}(\lambda_{m,1}) = \mathrm{Poisson}(\lambda_{m,2})\right) \\ p_{m,3} = P\left(\mathrm{Poisson}(\lambda_{m,1}) < \mathrm{Poisson}(\lambda_{m,2})\right) \end{cases}$
wherein P( ) is a distribution probability; m is the sequence number of the match, and the value range of m is 1 to M; and substituting the home team goal parameter $\lambda_{m,1}$ and the visiting team goal parameter $\lambda_{m,2}$ of each match that are determined based on odds, as well as a home team score $s_{m,1}$ and a visiting team score $s_{m,2}$ of each match into the following formula to construct a joint likelihood model as follows regarding a match team offensive/defensive capability parameter $\theta$, and determining an offensive/defensive capability parameter $\theta$ of each match team in a maximum posterior manner:
$\log P(\theta) = g \log P(s \mid \theta) + (1 - g) \log P(\lambda \mid \theta) + \log N(\theta; 0, \sigma_1^2) + \log N(\alpha_i - d_i; 0, \sigma_2^2)$
wherein
$\theta = \left(\{\alpha_i\}_{i=1 \ldots n}, \{d_j\}_{j=1 \ldots n}, \{b_k, b_{k'}\}_{k,k'=1 \ldots k}\right)$
n is a sequence number of a match team, $\alpha_i$ is an offensive capability parameter of a match team i, $d_j$ is a defensive capability parameter of a match team j, and $b_k$ and $b_{k'}$ are the state adjustment parameters of the match that are respectively used for correcting, according to the state of a match, an offensive capability parameter and a defensive capability parameter of a match team when serving as a home team and a visiting team; $P(\theta)$ is a distribution probability of $\theta$;
$P(s \mid \theta) = \frac{\Gamma\left(\exp(x) + s\right)}{\Gamma\left(\exp(x)\right)} \cdot \frac{\beta^{\exp(x)}}{(\beta + 1)^{\exp(x) + s}} \cdot \frac{1}{s!}$
wherein, when the value of s is $s_{m,1}$, $x = b_{k_m} + \alpha_{i_m} - d_{j_m}$; and when the value of s is $s_{m,2}$, $x = b_{k'_m} + \alpha_{j_m} - d_{i_m}$;
$P(\lambda \mid \theta) = \frac{\beta^{\exp(x)}}{\Gamma\left(\exp(x)\right)} \cdot \lambda^{\exp(x) - 1} \cdot \exp(-\beta\lambda)$
wherein, when the value of $\lambda$ is $\lambda_{m,1}$, $x = b_{k_m} + \alpha_{i_m} - d_{j_m}$; and when the value of $\lambda$ is $\lambda_{m,2}$, $x = b_{k'_m} + \alpha_{j_m} - d_{i_m}$; wherein g is the preset weight value, $\beta$ is the first preset adjustment parameter value, $\sigma_1$ is the second preset adjustment parameter value, and $\sigma_2$ is the third preset adjustment parameter value; $\Gamma(\,)$ is a gamma function; the $\log N(\,)$ function is a logarithmic normal distribution function; $\log N(\theta; 0, \sigma_1^2)$ is used for correcting a match team with historical data lower than a set threshold value; and $\log N(\alpha_i - d_i; 0, \sigma_2^2)$ is used for correcting the balance of the offensive/defensive capability of a match team.
 8. The method of claim 7, wherein said determining the parameter in the probability distribution comprises: with regard to a match c to be predicted, determining a home team distribution parameter $\lambda_{c,1}$ and a visiting team distribution parameter $\lambda_{c,2}$ of the match c to be predicted according to offensive/defensive capability parameters $\theta$ of a home team i and a visiting team j participating in the match based on relational expressions as follows:
$\lambda_{c,1} \sim \mathrm{Gamma}\left(\exp\left(b_{k_c} + \alpha_{i_c} - d_{j_c}\right), \beta\right)$
$\lambda_{c,2} \sim \mathrm{Gamma}\left(\exp\left(b_{k'_c} + \alpha_{j_c} - d_{i_c}\right), \beta\right)$
where $b_{k_c}$ is a home team correction coefficient determined according to the match c to be predicted, $\alpha_{i_c}$ is an offensive capability parameter of the home team i, $d_{j_c}$ is a defensive capability parameter of the visiting team j, $b_{k'_c}$ is a visiting team correction coefficient determined according to the match c to be predicted, $\alpha_{j_c}$ is an offensive capability parameter of the visiting team j, and $d_{i_c}$ is a defensive capability parameter of the home team; with regard to the match c to be predicted, determining odds $p_{c,1}$, $p_{c,2}$ and $p_{c,3}$ according to the home team distribution parameter $\lambda_{c,1}$ and visiting team distribution parameter $\lambda_{c,2}$ based on a relational expression as follows:
$\begin{cases} p_{c,1} = P\left(\mathrm{Poisson}(\lambda_{c,1}) > \mathrm{Poisson}(\lambda_{c,2})\right) \\ p_{c,2} = P\left(\mathrm{Poisson}(\lambda_{c,1}) = \mathrm{Poisson}(\lambda_{c,2})\right) \\ p_{c,3} = P\left(\mathrm{Poisson}(\lambda_{c,1}) < \mathrm{Poisson}(\lambda_{c,2})\right) \end{cases}$;
and with regard to the match c to be predicted, determining scores $s_{c,1}$ and $s_{c,2}$ according to the home team distribution parameter $\lambda_{c,1}$ and visiting team distribution parameter $\lambda_{c,2}$ based on relational expressions as follows:
$s_{c,1} \sim \mathrm{Poisson}(\lambda_{c,1})$
$s_{c,2} \sim \mathrm{Poisson}(\lambda_{c,2})$.
 9. The method of claim 7, further comprising performing normalization processing on the historical odds.
 10. The method of claim 9, wherein said performing the normalization processing occurs after said acquiring, with regard to the match, the historical score and historical odds.
 11. An apparatus for event predicting via machine learning based on multi-source heterogeneous data, comprising: a processor; and a memory having one or more programs stored thereon for instructing said processor, the one or more programs including: instruction for acquiring, with regard to an event of a set type, at least two types of historical data that can reflect an event result; instruction for establishing a joint likelihood model of attribute data of the event of the set type and the at least two types of historical data and determining an optimal estimation of the attribute data based on the joint likelihood model according to a maximum posterior principle; and instruction for determining, with regard to an event to be predicted which is of the set type, based on a probability distribution associated with the attribute data in the joint likelihood model, a parameter in the probability distribution as a prediction result of the event to be predicted, wherein the joint likelihood model includes one or more adjustment parameters for correcting the joint likelihood model, the adjustment parameters being determined iteratively based on an accuracy of the prediction result.
 12. The apparatus of claim 11, wherein the one or more programs include instruction for establishing the joint likelihood model according to a relationship between the at least two types of historical data and the attribute data and a correction function for correcting the relationship to satisfy a normal distribution.
 13. The apparatus of claim 12, wherein the relationship between the historical data and the attribute data comprises a Poisson distribution function.
 14. The apparatus of claim 12, wherein the relationship between the historical data and the attribute data comprises a gamma distribution function.
 15. The apparatus of claim 11, wherein the one or more adjustment parameters include one or more state adjustment parameters, a preset weight value, a first preset adjustment parameter value, a second preset adjustment parameter value, a third preset adjustment parameter value, or a combination thereof, wherein the one or more programs include: instruction for acquiring, with regard to a match, a historical score and historical odds which can reflect a match result as two types of historical data; instruction for: with regard to a match m, determining a home team goal parameter $\lambda_{m,1}$ and a visiting team goal parameter $\lambda_{m,2}$ of the Poisson distribution of said match m using home odds $p_{m,1}$, average odds $p_{m,2}$ and visiting odds $p_{m,3}$ of said match m based on a relational expression as follows:
$\begin{cases} p_{m,1} = P\left(\mathrm{Poisson}(\lambda_{m,1}) > \mathrm{Poisson}(\lambda_{m,2})\right) \\ p_{m,2} = P\left(\mathrm{Poisson}(\lambda_{m,1}) = \mathrm{Poisson}(\lambda_{m,2})\right) \\ p_{m,3} = P\left(\mathrm{Poisson}(\lambda_{m,1}) < \mathrm{Poisson}(\lambda_{m,2})\right) \end{cases}$
where P( ) is a distribution probability; m is a sequence number of the match, and a value range of m is 1 to M; and instruction for substituting the home team goal parameter $\lambda_{m,1}$ and the visiting team goal parameter $\lambda_{m,2}$ of each match which are determined based on odds, as well as a home team score $s_{m,1}$ and a visiting team score $s_{m,2}$ of each match into the following formula to construct a joint likelihood model as follows regarding a match team offensive/defensive capability parameter $\theta$, and determining an offensive/defensive capability parameter $\theta$ of each match team in a maximum posterior manner:
$\log P(\theta) = g \log P(s \mid \theta) + (1 - g) \log P(\lambda \mid \theta) + \log N(\theta; 0, \sigma_1^2) + \log N(\alpha_i - d_i; 0, \sigma_2^2)$
where
$\theta = \left(\{\alpha_i\}_{i=1 \ldots n}, \{d_j\}_{j=1 \ldots n}, \{b_k, b_{k'}\}_{k,k'=1 \ldots k}\right)$
n is the sequence number of a match team, $\alpha_i$ is an offensive capability parameter of a match team i, $d_j$ is a defensive capability parameter of a match team j, and $b_k$ and $b_{k'}$ are the state adjustment parameters of the match which are respectively used for correcting, according to the state of a match, an offensive capability parameter and a defensive capability parameter of a match team when serving as a home team and a visiting team; $P(\theta)$ is a distribution probability of $\theta$;
$P(s \mid \theta) = \frac{\Gamma\left(\exp(x) + s\right)}{\Gamma\left(\exp(x)\right)} \cdot \frac{\beta^{\exp(x)}}{(\beta + 1)^{\exp(x) + s}} \cdot \frac{1}{s!}$
when the value of s is $s_{m,1}$, $x = b_{k_m} + \alpha_{i_m} - d_{j_m}$; when the value of s is $s_{m,2}$, $x = b_{k'_m} + \alpha_{j_m} - d_{i_m}$;
$P(\lambda \mid \theta) = \frac{\beta^{\exp(x)}}{\Gamma\left(\exp(x)\right)} \cdot \lambda^{\exp(x) - 1} \cdot \exp(-\beta\lambda)$
when the value of $\lambda$ is $\lambda_{m,1}$, $x = b_{k_m} + \alpha_{i_m} - d_{j_m}$; when the value of $\lambda$ is $\lambda_{m,2}$, $x = b_{k'_m} + \alpha_{j_m} - d_{i_m}$; g is the preset weight value, $\beta$ is the first preset adjustment parameter value, $\sigma_1$ is the second preset adjustment parameter value, and $\sigma_2$ is the third preset adjustment parameter value; $\Gamma(\,)$ is a gamma function; the $\log N(\,)$ function is a logarithmic normal distribution function; $\log N(\theta; 0, \sigma_1^2)$ is used for correcting a match team with historical data lower than a set threshold value; and $\log N(\alpha_i - d_i; 0, \sigma_2^2)$ is used for correcting the balance of the offensive/defensive capability of a match team.
 16. The apparatus of claim 14, wherein the one or more programs include: instruction for, with regard to a match c to be predicted, determining a home team distribution parameter $\lambda_{c,1}$ and a visiting team distribution parameter $\lambda_{c,2}$ of said match c to be predicted according to offensive/defensive capability parameters $\theta$ of a home team i and a visiting team j participating in the match based on relational expressions as follows:
$\lambda_{c,1} \sim \mathrm{Gamma}\left(\exp\left(b_{k_c} + \alpha_{i_c} - d_{j_c}\right), \beta\right)$
$\lambda_{c,2} \sim \mathrm{Gamma}\left(\exp\left(b_{k'_c} + \alpha_{j_c} - d_{i_c}\right), \beta\right)$
where $b_{k_c}$ is a home team correction coefficient determined according to said match c to be predicted, $\alpha_{i_c}$ is an offensive capability of the home team i, $d_{j_c}$ is a defensive capability of the visiting team j, $b_{k'_c}$ is a visiting team correction coefficient determined according to said match c to be predicted, $\alpha_{j_c}$ is an offensive capability of the visiting team j, and $d_{i_c}$ is a defensive capability of the home team; instruction for, with regard to the match c to be predicted, determining odds $p_{c,1}$, $p_{c,2}$ and $p_{c,3}$ according to said home team distribution parameter $\lambda_{c,1}$ and visiting team distribution parameter $\lambda_{c,2}$ based on a relational expression as follows:
$\begin{cases} p_{c,1} = P\left(\mathrm{Poisson}(\lambda_{c,1}) > \mathrm{Poisson}(\lambda_{c,2})\right) \\ p_{c,2} = P\left(\mathrm{Poisson}(\lambda_{c,1}) = \mathrm{Poisson}(\lambda_{c,2})\right) \\ p_{c,3} = P\left(\mathrm{Poisson}(\lambda_{c,1}) < \mathrm{Poisson}(\lambda_{c,2})\right) \end{cases}$;
and instruction for, with regard to the match c to be predicted, determining scores $s_{c,1}$ and $s_{c,2}$ according to said home team distribution parameter $\lambda_{c,1}$ and visiting team distribution parameter $\lambda_{c,2}$ based on relational expressions as follows:
$s_{c,1} \sim \mathrm{Poisson}(\lambda_{c,1})$
$s_{c,2} \sim \mathrm{Poisson}(\lambda_{c,2})$.
 17. The apparatus of claim 14, wherein the one or more programs include instruction for performing normalization processing on the historical odds after acquiring, with regard to a match, a historical score and historical odds which can reflect a match result as two types of historical data.
 18. A non-transitory computer storage medium including at least one program for event predicting via machine learning based on multi-source heterogeneous data when implemented by a processor, comprising: instruction for acquiring at least two types of historical data associated with an event result for a first event of a predetermined type; instruction for establishing a joint likelihood model of attribute data of the first event and the at least two types of historical data; instruction for determining an optimal estimation of the attribute data based on the joint likelihood model according to a maximum posterior principle; and instruction for determining a parameter in a probability distribution as a prediction result of a second event based on the probability distribution associated with the attribute data in the joint likelihood model, the second event to be predicted based on the probability distribution and having a preselected type of the second event that is identical to the predetermined type of the first event, wherein the joint likelihood model includes one or more adjustment parameters for correcting the joint likelihood model, the adjustment parameters being determined iteratively based on an accuracy of the prediction result.
 19. The computer storage medium of claim 18, wherein said instruction for establishing comprises instruction for establishing the joint likelihood model according to a relationship between the at least two types of historical data and the attribute data and a correction function for correcting the relationship to satisfy a normal distribution.
 20. The computer storage medium of claim 19, wherein the relationship between the at least two types of historical data and the attribute data comprises at least one of a Poisson distribution function and a gamma distribution function.